Increasing Software Reliability through Rollback and On-line Fault Repair

Authors

Deepak Gupta

Pankaj Jalote

Abstract

In this paper, we propose a new paradigm for increasing the reliability of a software system by combining reactive and proactive approaches. The proposed approach employs rollback and restart for masking transient failures, and employs on-line software version change to remove faults from the software. A model for reliability analysis of a system employing the proposed approach is presented. The analysis shows that substantial benefit in reliability can be obtained by employing the proposed approach. A prototype system which incorporates the proposed approach is also described.
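
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype: the function name `run_with_recovery`, the `max_retries` parameter, and the idea of passing an ordered list of software versions are assumptions made purely for illustration. A step is run from a saved checkpoint; a failure triggers rollback and retry (masking transient failures), and a failure that persists across retries is treated as a software fault, at which point a repaired version is switched in while the checkpointed state is preserved.

```python
import copy


def run_with_recovery(state, versions, max_retries=2):
    """Run a step from a checkpoint, with rollback, retry, and version change.

    `versions` is an ordered list of callables that mutate a state dict
    (hypothetical: versions[0] is the running code, later entries are repairs).
    Returns (new_state, index_of_the_version_that_succeeded).
    """
    checkpoint = copy.deepcopy(state)  # recovery point taken before the step
    for index, step in enumerate(versions):
        for _attempt in range(max_retries + 1):
            # Rollback: every attempt restarts from a fresh copy of the
            # checkpoint, so partial updates from a failed attempt are dropped.
            working = copy.deepcopy(checkpoint)
            try:
                step(working)
                return working, index
            except Exception:
                continue  # retry; a transient failure may not recur
        # Retries exhausted: assume a real fault and try the next version.
    raise RuntimeError("all versions failed from the checkpoint")
```

In this sketch the caller's original `state` is never modified, so a failed attempt cannot leak a half-applied update. How checkpoints are actually taken and how a new version is installed in a running system is left open here.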

Similar resources

Empowering Software Debugging Through Architectural Support for Program Rollback

This paper proposes the use of processor support for program rollback, as a key primitive to enhance software debugging in production-run environments. We discuss how hardware support for program rollback can be used to characterize bugs on-the-fly, leverage code versioning for performance or reliability, sandbox device drivers, collect monitoring information with very low overhead, support fai...

Transient and Intermittent Fault Recovery without Rollback

Increasing chip density combined with heightened reliability expectations has spawned greater interest in fault tolerant design. In recent years, research into rollback and retry techniques has established them as an effective approach to recovery from transient and intermittent faults. For applications with strict timing requirements, however, the high error latency inherent in retry approaches...

Encore: Low-Cost, Fine-Grained Transient Fault Recovery

To meet an insatiable consumer demand for greater performance at less power, silicon technology has scaled to unprecedented dimensions. However, the pursuit of faster processors and longer battery life has come at the cost of device reliability. Given the rise of processor (un)reliability as a first-order design constraint, there has been a growing interest in low-cost, non-intrusive techniques...

System Reliability of Fault Tolerant Data Center Rev4

A single point of failure (SPOF) in system operations is a weak point of system reliability. The mean time to failure (MTTF) of system operations is equal to the MTTF of the shortest-lived component in the system. A Tier IV data center is designed to eliminate the SPOF. Data center system reliability depends not only on the MTTF of each component in the system, but also on the mean time to repair (MTTR...

Double phase fault location in microgrids with the presence of electric vehicles and Distributed parameters line model

Nowadays, renewable energy is increasingly used in smart grids and microgrids to reduce the use of fossil fuels and improve network efficiency. Like all power system devices, microgrids are subject to transient and steady-state faults, such as short circuits. These faults impair reliability and cause consumer dissatisfaction. To accurately, automatically, and economically determine the location of a ...

Publication date: 1997
